Identification of outliers types in multivariate time series using genetic algorithm

Abstract:

Multivariate time series data are often modeled using the vector autoregressive moving average (VARMA) model. However, the presence of outliers can violate the stationarity assumption and may lead to incorrect modeling, biased parameter estimates, and inaccurate predictions. Detecting these points and handling them properly is therefore necessary, especially for modeling and parameter estimation in VARMA models. Once outliers are detected, their effects can be removed from the series to obtain modified data. Estimates of the VARMA model obtained from the modified data are then the least affected by outliers. Outlier detection is also important for identifying external events over time. For example, finding outliers in river water monitoring data can reveal the times of floods. Parameter estimation for the VAR model is less time consuming than for VARMA. Moreover, under the invertibility condition, VARMA models can be approximated by VAR(p) for large p. Therefore, we use the VAR model to fit and investigate data generated from VARMA models that are contaminated by outliers. Multivariate time series observations may be contaminated with different types of outliers. However, the effects of the different outlier types differ between the multivariate and univariate cases, so such observations must be assessed with a multivariate approach. In this research, we use a genetic algorithm (GA) to develop a procedure for detecting different types of outliers (additive, innovation, level shift, and temporary change outliers) in a multivariate time series. The GA searches for the outlier locations that minimize an Akaike-like information criterion (AIC), which amounts to trying to "minimize the number of outliers" and "maximize the likelihood function". GA is a numerical optimization algorithm whose idea is based on natural selection and natural genetics.
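To make the four outlier types named above concrete, the following Python sketch builds the effect that a single outlier of size omega at time h adds to a k-dimensional series, using the usual intervention forms. It is only an illustration under stated assumptions: the function name `outlier_effect`, the decay rate `delta = 0.7` for the temporary change, and the use of a VAR(1) coefficient matrix `phi` for the innovation outlier are choices made here, not details given in the abstract.

```python
import numpy as np

def outlier_effect(kind, n, h, omega, phi=None, delta=0.7):
    """Return an (n, k) array holding the effect of one outlier at time h.

    kind  : "AO" (additive), "IO" (innovation), "LS" (level shift),
            or "TC" (temporary change)
    omega : outlier size, one value per series component
    phi   : VAR(1) coefficient matrix, needed only for "IO" (assumed model)
    delta : decay rate of the temporary change (assumed value)
    """
    omega = np.asarray(omega, dtype=float)
    effect = np.zeros((n, omega.size))
    if kind == "IO":
        if phi is None:
            raise ValueError("an innovation outlier needs the VAR(1) matrix phi")
        phi = np.asarray(phi, dtype=float)
    for t in range(h, n):
        j = t - h  # number of steps since the outlier occurred
        if kind == "AO":
            # affects the single observation at time h only
            if j == 0:
                effect[t] = omega
        elif kind == "LS":
            # permanent step from time h onwards
            effect[t] = omega
        elif kind == "TC":
            # jump at time h that decays geometrically
            effect[t] = delta ** j * omega
        elif kind == "IO":
            # a shock to the innovation propagates through the model dynamics;
            # for a VAR(1) the weight at lag j is phi raised to the power j
            effect[t] = np.linalg.matrix_power(phi, j) @ omega
        else:
            raise ValueError(f"unknown outlier type: {kind}")
    return effect
```

Adding `outlier_effect(...)` to a clean simulated series gives a contaminated series, and subtracting an estimated effect gives a modified series. Because `phi` couples the components, an innovation outlier in one component can spread to the others, which is one reason the multivariate case differs from the univariate one.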
This algorithm does not require strong assumptions to obtain the optimal value of a function, and it can search for the optimal solution in a space with several local optima. For example, if a function has several relative maxima, GA can also find its absolute maximum. To minimize a function, GA first generates, at random or by choice, several candidate solutions; this set of solutions is called the initial population, and each solution is called a chromosome. Then, using reproduction operators, we combine chromosomes (crossover) and apply mutation to them. If the function value of the newly produced chromosomes is lower than that of the previous chromosomes, the new chromosomes are added to the population or replace the less fit chromosomes in it. This process is repeated until convergence occurs or the maximum number of iterations is reached. Furthermore, we introduce another outlier detection method, the Tsay, Pena and Pankratz (TPP) method. TPP uses test statistics based on the outlier sizes and the VAR parameters. This method detects outliers in three stages. In stage I, it detects outliers one by one and removes their effects, iterating until no further outlier is found. In stage II, the effects of the outliers detected in stage I are estimated simultaneously, and outliers with insignificant effects are removed. The VAR parameters are then re-estimated from the modified series of this stage. In stage III, stages I and II are repeated with the new VAR parameter estimates. In each iteration of TPP, an outlier is detected and its effect is removed from the series, giving the modified series. The parameters are then estimated from the modified series, and detection of the next outlier continues using these estimates. This may lead to biased estimates and incorrect detection of the next outlier.
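The GA loop described earlier in this paragraph (initial population, combination of chromosomes, mutation, and replacement of less fit chromosomes) can be sketched in Python as follows. This is a minimal generic version, not the paper's implementation: the binary chromosome encoding, binary tournament selection, one-point crossover, the parameter defaults, and the name `genetic_minimize` are all assumptions made for illustration.

```python
import random

def genetic_minimize(fitness, n_genes, pop_size=30, n_iter=500, p_mut=0.05, seed=0):
    """Minimize `fitness` over binary chromosomes of length n_genes (n_genes >= 2).

    Returns the best chromosome, its fitness value, and the history of the
    best value after each iteration.
    """
    rng = random.Random(seed)
    # initial population: random binary chromosomes
    pop = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    scores = [fitness(c) for c in pop]
    history = [min(scores)]

    def pick_parent():
        # binary tournament: the better of two randomly drawn chromosomes
        a, b = rng.sample(range(pop_size), 2)
        return pop[a] if scores[a] <= scores[b] else pop[b]

    for _ in range(n_iter):
        p1, p2 = pick_parent(), pick_parent()
        # one-point crossover combines the two parents
        cut = rng.randint(1, n_genes - 1)
        child = p1[:cut] + p2[cut:]
        # mutation flips each gene with probability p_mut
        child = [1 - g if rng.random() < p_mut else g for g in child]
        child_score = fitness(child)
        # the child replaces the worst chromosome only if it is better
        worst = max(range(pop_size), key=scores.__getitem__)
        if child_score < scores[worst]:
            pop[worst], scores[worst] = child, child_score
        history.append(min(scores))

    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best], history
```

For outlier detection, a chromosome could mark each time point as outlier or not, with `fitness` returning the information criterion for that pattern. Because a child only ever replaces the worst chromosome, the best value in the population never gets worse from one iteration to the next.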
In other words, in the TPP method, one detected outlier can hide another outlier (masking), or can cause an ordinary observation to be flagged as an outlier (swamping). This method also often misidentifies the type of an outlier. In each iteration of GA, by contrast, a random pattern of outliers (to be tested) is first generated, and a temporary modified series is obtained by removing the effect of this pattern from the series. The parameters are then estimated and the pattern is evaluated. This reduces the influence of previously identified outliers on the full outlier pattern. In fact, if the random pattern contains all outliers correctly, almost all of their effects are eliminated in the modified series. Using this temporary modified series, GA therefore obtains more accurate estimates and detects outliers more accurately. The simulation results confirm the validity of the GA method: its percentage of correct outlier detection is higher than that of the TPP method. GA does, however, require more computation time. Also, although the VAR model is used in both detection methods, the percentage of correct outlier detection for data from VARMA models is similar to that for data from VAR models. Gas-furnace data were analyzed and modeled, and the GA and TPP methods detected similar outliers. Fitting a VAR(6) model to these data shows that, in the GA-modified data relative to the TPP-modified data, the error variance of the input gas is reduced by 17% and the error variance of carbon dioxide is reduced by 43%.
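The evaluation of one candidate outlier pattern, as described in the abstract, might look like the following Python sketch. It rests on strong simplifying assumptions that are not from the paper: only additive outliers are considered, the effect of a flagged point is removed crudely by replacing it with the average of its two neighbours, and the criterion is taken to be (n - p) log det(Sigma) plus a penalty of 2k per flagged point. The names `fit_var_residual_cov` and `pattern_criterion` are hypothetical.

```python
import numpy as np

def fit_var_residual_cov(y, p):
    """Fit a VAR(p) with intercept by least squares; return the residual covariance."""
    n, k = y.shape
    # regressors: a column of ones followed by lags 1..p of the series
    X = np.hstack([np.ones((n - p, 1))] + [y[p - i:n - i] for i in range(1, p + 1)])
    Y = y[p:]
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ B
    return resid.T @ resid / (n - p)

def pattern_criterion(y, outlier_times, p=1):
    """AIC-like score of a candidate pattern of additive outliers (lower is better).

    Assumed form: (n - p) * log det(Sigma) + 2 * k * (number of flagged points).
    """
    n, k = y.shape
    y_mod = y.copy()
    for t in outlier_times:
        # crude stand-in for effect removal: average of the two neighbours
        lo, hi = max(t - 1, 0), min(t + 1, n - 1)
        y_mod[t] = 0.5 * (y[lo] + y[hi])
    # parameters are estimated from the temporary modified series
    sigma = fit_var_residual_cov(y_mod, p)
    _, logdet = np.linalg.slogdet(sigma)
    return (n - p) * logdet + 2 * k * len(outlier_times)
```

The two terms mirror the trade-off stated in the abstract: a pattern that removes real outliers shrinks the residual covariance (a higher likelihood), while each additional flagged point costs a penalty (fewer outliers preferred). A GA would call such a function as the fitness of each chromosome.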

Similar resources

Finding multivariate outliers in fMRI time-series data

A fundamental challenge for researchers studying the brain is to explain how distributed patterns of brain activity relate to a specific representation or computation. Multivariate techniques are therefore becoming increasingly popular for pattern localization of functional magnetic resonance imaging (fMRI) data. The increased power of these techniques can be offset by their susceptibility to m...

Identification of local multivariate outliers

Abstract The Mahalanobis distance between pairs of multivariate observations is used as a measure of similarity between the observations. The theoretical distribution is derived, and the result is used for judging on the degree of isolation of an observation. In case of spatially dependent data where spatial coordinates are available, different exploratory tools are introduced for studying the ...

New Proposals in Multivariate Outliers Identification

Occurrences of outliers in multivariate time series are unpredictable events which may severely distort the analysis of the series. It may be noticed that a convenient way for representing multiple outliers consists in superimposing a deterministic disturbance to a Gaussian multivariate time series. Then outliers may be modelled as non-Gaussian time series components. The independent componen...

Identification of Multivariate Outliers: A Performance Study

Three methods for the identification of multivariate outliers (Rousseeuw and Van Zomeren, 1990; Becker and Gather, 1999; Filzmoser et al., 2005) are compared. They are based on the Mahalanobis distance that will be made resistant against outliers and model deviations by robust estimation of location and covariance. The comparison is made by means of a simulation study. Not only the case of mult...

MULTI-OBJECTIVE OPTIMIZATION OF TIME-COST-SAFETY USING GENETIC ALGORITHM

Safety risk management has a considerable effect on disproportionate injury rate of construction industry, project cost and both labor and public morale. On the other hand time-cost optimization (TCO) may earn a big profit for project stakeholders. This paper has addressed these issues to present a multi-objective optimization model to simultaneously optimize total time, total cost and overall ...

Detection of Outliers in Time Series Data

Samson Kiware, B.A., Marquette University, 2010. This thesis presents the detection of time series outliers. The data set used in this work is provided by the GasDay Project at Marquette University, which produces mathematical models to predict the consumption of natural gas for Local Distribution Companies (LDCs). Flow with no outliers is required to dev...

Journal title

Volume 8, Issue 4

Pages 0-0

Publication date 2022-12

Keywords

No keywords are provided for this article.
